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Gridded estimated rainfall intensity values at very high spatial 
and temporal resolution levels are needed as main inputs for weather 
prediction models to obtain accurate precipitation forecasts, and to 
verify the performance of precipitation forecast models. These grid- 
ded rainfall fields are also the main driver for hydrological models 
that forecast flash floods, and they are essential for disaster predic- 
tion associated with heavy rain. Rainfall information can be obtained 
from rain gages that provide relatively accurate estimates of the ac- 
tual rainfall values at point-referenced locations, but they do not 
characterize well enough the spatial and temporal structure of the 
rainfall fields. Doppler radar data offer better spatial and temporal 
coverage, but Doppler radar measures effective radar reflectivity (Ze) 
rather than rainfall rate (R). Thus, rainfall estimates from radar data 
suffer from various uncertainties due to their measuring principle and 
the conversion from Ze to R. We introduce a framework to combine 
radar reflectivity and gage data, by writing the different sources of 
rainfall information in terms of an underlying unobservable spatial 
temporal process with the true rainfall values. We use spatial logistic 
regression to model the probability of rain for both sources of data in 
terms of the latent true rainfall process. We characterize the different 
sources of bias and error in the gage and radar data and we estimate 
the true rainfall intensity with its posterior predictive distribution, 
conditioning on the observed data. Our model allows for nonstation- 
ary and asymmetry in the spatio-temporal dependency structure of 
the rainfall process, and allows the temporal evolution of the rain- 
fall process to depend on the motions of rain fields, and the spatial 
correlation to depend on geographic features. We apply our methods 
to estimate rainfall intensity every 10 minutes, in a subdomain over 
South Korea with a spatial resolution of 1 km by 1 km. 
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1. Introduction. Precipitation is a key component that links the atmo- 
sphere, the ocean and the Earth's surface through complex processes. Thus, 
accurate knowledge of precipitation levels is a fundamental requirement for 
improving the prediction of weather systems and of climate change. In par- 
ticular, some of the weather forecast models use data assimilation techniques 
that require accurate precipitation estimates at high spatial and temporal 
resolutions. Accurate rain maps are also used to verify the performance of 
precipitation forecast models and can pinpoint the shortcomings of numeri- 
cal models. From a hydrological point of view, the rainfall is a main driver 
in predicting stream flow. Accurate estimated spatial and temporal struc- 
tures of precipitation are the most important component to understand and 
predict flash flood. Rain gages are widely used to measure rainfall accu- 
mulation, but the information they provide is limited by their spatial and 
temporal resolution. In particular, the deployment of dense rain gage net- 
works to resolve detailed spatial structure of rain fields is costly and their 
maintenance is time-consuming. 

Remote sensing technology has improved significantly over the last few 
decades and can now provide quantitative information on precipitation. 
Doppler radars estimate instant rainfall intensity or rate (R) at very high 
spatial and temporal resolution (typically, 1 km by 1 km and in the order 
of minutes). However, a radar does not measure rainfall rate directly but 
infers the rain rate from the measured effective radar reflectivity (Ze). The 
conversion from radar reflectivity Ze to rainfall R is usually done with the 
transformation Ze = aR? , where a = 200, f3 = 1.6 [e.g., Lee and Zawadzki 
(2005)]. This conversion function is typically derived from the measured 
drop size distributions (DSDs). The DSDs vary with different microphysical 
and dynamical processes. There is no unique conversion equation that sat- 
isfies all different processes, rather varies from storm to storm and within 
storm [e.g., Lee and Zawadzki (2005)]. The error associated with the R-Ze 
conversion is inherent. Thus, the main source of error of the radar rainfall 
estimates are associated with reflectivity measurement errors and R-Ze con- 
version errors. Many studies have attempted to correct for these two types 
of errors [e.g., Zawadzki (1984), Joss and Waldvogel (1990), Jordan, Seed 
and Austin (2000), Germann et al. (2006), Lee and Zawadzki (2005), Bellon 
et al. (2006)]. 

Even when these conversion errors are corrected based on an understand- 
ing of the physical processes and the Ze measurement errors are reduced, 
the radar data are not directly comparable to rain gage measurements due 
to the differences in sampling volumes between the two different sensors. A 
rain gage accumulates rainfall at a point on the ground, while radar sam- 
ples at a volume of approximately 1 km 3 at some height above the ground. 
Rain gages are also relatively sparse across space (in relation to the spatial 
resolution of the radar data), thus, it is difficult to infer small scale rainfall 
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spatial patterns from gages. Ideally, one would want to combine both sources 
of rainfall data to obtain more accurate rainfall estimates. This paper in- 
troduces a statistical framework for combining radar reflectivity and gage 
measurements to obtain estimates of rainfall rate, taking into account the 
different sources of error and bias in both sources of data. We also introduce 
a sound statistical framework to estimate the R-Ze conversion equation as 
a spatial function. In some of the previous work [e.g., Chumchean, Seed and 
Sharma (2004)] the relation between radar and gage data is only modeled 
when it rains, eliminating the zero-rain observations. In our framework, we 
use all available data, including the zero-rain events. Furthermore, we use 
spatial logistic regression to model the probability of zero-rain for gage and 
radar data as a spatial process. 

Statistical models have been developed to estimate the distribution of the 
length of wet and dry periods of rainfall patterns [Green (1964), Hughes 
and Guttorp (1994), Sanso and Guenni (2000), Zhang and Switzer (2007)]. 
In particular, the stochastic model introduced by Zhang and Switzer (2007) 
describes regional-scale, ground-observed storms by using a Boolean random 
field of rain patches. However, mesoscale instantaneous rainfall patterns are 
very difficult to model using statistical or physical models, in part due to 
lack of real-time weather stations measuring rainfall intensity (the Korean 
network is quite unique in this regard), and because of the spatial and tem- 
poral heterogeneity and the inherent variability of rainfall process at high 
resolution in space and time. This study is aimed to develop a statistical 
model that can be implemented in real time for the estimation of rainfall 
intensity maps at high resolutions (1 km in space and 10 minutes in time). 
Our model treats the true unobservable rainfall intensity as a latent process, 
and we make inference about that process using a statistical framework that 
relates radar and gage data and other weather and geographic covariates to 
the latent process. 

Rainfall intensity changes rapidly across space and time, in particular, 
depending on the direction and intensity of the winds. Estimating com- 
plex spatial temporal dependency structures can be computationally de- 
manding, since it involves fitting models that go beyond the standard as- 
sumptions of stationarity and full symmetry of the geostatistical models. 
The assumption of stationarity and full symmetry for spatial-temporal pro- 
cesses offers a simplified representation of any variance-covariance matrix, 
and consequently, some remarkable computational benefits. Suppose that 
{Z(s, t) : s = (si, S2, ■ ■ • , Sd)' GDc M. d , t G [0, oo)} denotes a spatial-temporal 
process where s is a spatial location over a fixed domain B, R rf is a d- 
dimensional Euclidean space and t indicates time. The covariance func- 
tion is defined as C(sj — sj; tk — U\0) = covg{Z(si,tk), Z(sj,ti)}, where Sj = 
(s\, . . . , s d )' and C is positive-definite for all 6, vector of covariance param- 
eters. Under the assumption of stationarity, we have C(sj — s,;i& — ti\9) = 
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C(h; u\9), where h = (hi, . . . , hd)' = s, — Sj and u = tk —ti- Nonstationary 
models have been introduced by Sampson and Guttorp (1992), Higdon, Swall 
and Kern (1999), Nychka, Wikle and Royle (2002) and Fuentes (2002), 
among others. Another assumption of the commonly used spatial models 
is full symmetry. Under the assumption of full symmetry [Gneiting (2002)] 
we have C(h; u\0) = C(h; -u\0) = C(-h;u\0) = C(-h; -u\0). In the purely 
spatial context, this property is also known as axial symmetry [Scaccia and 
Martin (2005)] or reflection symmetry [Lu and Zimmerman (2005)]. Stein 
(2005) introduced asymmetric models for spatial-temporal data. Park and 
Fuentes (2008) have introduced this concept of lack symmetry in a more gen- 
eral context for nonstationary space-time process and they have developed 
asymmetric space-time models. However, these nonstationary and asym- 
metric models are computationally demanding. In our framework, we allow 
enough flexibility in the potentially nonstationary and asymmetric spatial- 
temporal evolution of the rainfall process using a computationally efficient 
approach. We introduce a latent displacement vector that explains the echo 
motion of the rainfall intensity, we use motion field information to explain 
this shift of the rainfall across space and time. 

We apply our methods to estimate rainfall intensity at 4:20 am and 
4:30 am on July 12, 2006, for a domain over South Korea with a spatial 
resolution of 1 km by 1 km. The temporal resolution is limited by the up- 
date of the radar data (every 10 minutes). At each given time we have 102 
observations from the gage network and 10,000 radar reflectivity data points. 

This paper is organized as follows. In Section 2 we describe the data. 
In Section 3 we introduce the statistical framework to combine radar and 
gage rainfall data. In Section 4 we apply our model to rainfall data in South 
Korea. We conclude in Section 5 with a discussion and some final remarks. 

2. Data. 

Radar data. The Korean Meteorological Administration (KMA) oper- 
ates 11 Doppler weather radars (4 C-band and 7 S-band) and collects radar 
volume scans every ten minutes. In this study we use 10 minutes radar data 
for July 12, 2006 over a subdomain in South Korea at 4:20 am and 4:30 am. 
The constant altitude Plan Position Indicators (CAPPI) of equivalent radar 
reflectivity is constructed from each volume scan at 1.5 km height and a 
temporal and spatial resolution of 10 minutes and 1 km, respectively, af- 
ter eliminating ground clutter (GC) and anomalous propagation (AP). The 
reflectivity composite is produced from individual radar CAPPIs with an 
array of 901 by 1051 at the spatial and temporal resolutions of 1 km and 
10 minutes. The weather radars at short wavelengths such as the C-band 
(~5 cm) suffer from serious attenuation of signals due to strong precipita- 
tion over a radar or along the path of the radar beams. In addition, radar 
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measurements are limited by the blockage of radar beams by complex ter- 
rain over the Korean peninsula. The correction of the attenuation and beam 
blockage could be obtained with complex algorithms that are very compu- 
tationally demanding. Here, we use a standard approach and we select the 
maximum value at the overlapping grids to mitigate the attenuation and 
beam blockage. 

The radar composite in South Korea is re-sampled at the smaller domain 
of 100 km by 100 km around Seoul, South Korea, with latitude ranging 
from 36.977 to 37.898 and longitude ranging from 126.173 to 127.33. This 
subdomain particularly provides a dense network of rain gages (see Figure 1). 
There are two radar stations within this subdomain, at locations (37° 26', 
126°57') and (37°27', 126°21') (see Figure 4). The radar on the left-hand 
side in Figure 4 is C-band so its measurements are affected by attenuation 
errors. Some of the radar data on the right upper and lower areas of our 
subdomain are obtained from additional S-band KMA radars located outside 
our geographic subdomain and, thus, are not shown in Figure 4. 

The radar precipitation map is typically generated by converting radar 
equivalent reflectivity (Ze) into rainfall intensity [R, units = (mm/li)] us- 
ing the relationship Ze = aR 13 , a is generally set to 200 and j3 to 1.6. The 
conversion can result in some error associated with the variation of micro- 
physical processes. The conversion errors vary in time and space [Lee and 
Zawadzki (2005), Lee et al. (2007)]. In this paper we work directly with the 
reflectivity Ze and we estimate the values of a and f3, treating a as a spatial 
process. 

Station data. The Korean automatic weather stations (AWS) measure 
the precipitation accumulation, the east-west component of the wind vector 
(m/s) (u wind component), the north-south component of the wind vector 
(m/s) (v wind component), temperature (°C) and relative humidity (%) 
every minute. The bucket size of rain gage is 0.5 mm and the number of tips 
is recorded every minute from which the 1-minute rainfall intensity (mm/h) 
is derived using a standard approach called the Tropical Rainfall Measuring 
Mission (TRMM)/Gauge Data Software Package (GSP) algorithm [Wang, 
Fisher and Wolf (2008)]. This algorithm first identifies the rain event by 
checking the time interval between two tips and fills the gap between two 
tips by adding a half-tip. Then, the cumulative distribution of the amount of 
rain is derived as a function of time (using cubic splines) and the 1-minute 
rainfall intensity is obtained from the slope of this function. To minimize the 
random measurement noise, a 10-minutes moving average centered at the 
given time is applied, and then the averaged values at the radar measurement 
time are re-sampled. Zero values represent no-rain but they can also occur 
by malfunctioning gages. The obtained values are point measurements so 
representative errors need be characterized. In this study we use the weather 
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Fig. 1. Radar reflectivity and gage data (mm/h) on July 12, 2006 at 4:20 am and 4:30 
am. The background color is the reflectivity, the gage values are represented with 3 different 
symbols: circle — no rain, square — 0-20 mm/h, triangle=more than 20 mm/h. 
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stations over a subdomain in South Korea with latitude values ranging from 
36.977 to 37.898 and longitude values from 126.173 to 127.33, for July 12, 
2006. The total number of weather stations in our subdomain is 102. The 
climate weather stations in the U.S. record rainfall accumulation every hour 
and there are only few stations very sparse across space that record real-time 
meteorological data. Some special stations transmit high temporal resolution 
in real time. However, all station data used in this study are transmitted 
every minute in real time. The average spatial resolution is about 13 km 
and even higher around large cities (few kilometers in some parts of our 
subdomain). In addition, the weather system in South Korea has complex 
mesoscale features due to interactions with mountains and oceans. Thus, 
these station data provide a unique opportunity to explore the small scale 
structure of rain fields. 

Elevation data. Geographical data are useful to understand the modu- 
lation of precipitation systems by surface conditions, in particular, in Korea 
where weather systems interact with complex geography. In this study we 
use Digital Elevation Model (DEM) data. The DEM elevation data (in me- 
ters above sea level) have resolution of 1 km and are available from the US 
Geological Survey. The elevation increases as we move west-to-east in the 
eastern part of our subdomain. Thus, this phenomenon could lead to more 
beam blockage for the radar data in these areas, which is compensated by 
selecting maximum values for the overlapping radar data. 

3. Model description. Let R(s,t) be the continuous spatial temporal 
process with the observed precipitation values at location s and time t as 
measured by the rain gages, and Ri(t) the radar reflectivity data for grid 
cell i and time t. Ri(t) represents the continuous component (when it rains) 
of the underlying true instant rainfall process at time t averaged within grid 
cell i. We model the latent rainfall process Ri{t) in the natural logarithm 
scale, and we denote Yi(t) = log(i?.;(t)). The logarithm transformation is 
preferred since in the original scale too much attention is given to strong 
rainfall rates [Lee and Zawadzki (2005)]. 

In our framework we have 2 main stages. This framework is fitted using 
a fully Bayesian approach. In stage 1 we model the different sources of data 
in terms of an underlying latent rainfall process, and in stage 2 we describe 
the model for the latent process given some weather covariates. We add a 
stage to model the weather covariates; this stage is implemented outside 
the fully Bayesian framework for computational convenience. 
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Stage 0. 

Covariates. In our framework the weather variables, such as tempera- 
ture, relative humidity, and wind fields are important covariates to explain 
the latent rainfall process and its temporal evolution. However, these weather 
covariates are not observed at all locations of interest for rainfall prediction 
within our 1 km by 1 km resolution gridded domain. Therefore, we do spatial 
interpolation using thin plate splines. The number of basis is chosen using 
generalized cross-validation (GCV) [Craven and Wahba (1979)]. 

Stage 1. 

Gage data model. We model the gage data in terms of an underlying 
true rainfall process (averaged within each grid cell) with some error, and 
we model the probability of zero-rain for the gage data as a spatial process 
TT g (s,t) also in terms of the latent rainfall process. We call this model a 
zero-inflated log-Gaussian process (LGP). Thus, we have 



R(s,t) 



0, Kg(s,t), 
exp{y i(s )(t) +e g (s,t)}, l-7T g (s,t), 



with 7T g (s,t) the probability of zero rain at location s and time t for the rain 
gage data. The subindex i(s) for the process Y with the log rainfall values 
refers to the grid cell containing s. The component e g (s,t) characterizes 
the variations in the gage data with respect to the truth at location s and 
time t, due to measurement error and also due in part to the temporal and 
spatial misalignment (different scales) of the gage data and the truth. The 
misalignment is a consequence of the temporal averaging of the gage data 
to obtain R(s,t) and the spatial averaging of the truth. This variation is 
expected to be proportional to the precipitation levels that is why we have 
multiplicative errors in the original scale. 

The logits of the unknown vr(s,t) probabilities (i.e., the logarithms of 
the odds) are modeled as a linear function of the rainfall values (log scale) 
^i(s)(*)> using a spatial logistic regression model, 

logit(7r 9 (s,i)) =a g + bgY i{s) (t), 

where a g and b g are unknown coefficients. 

In our zero-inflated LGP we model the probability of zero rain in terms 
of the continuous component of the rainfall, Y^(t), and thus, it differs 
from most zero-inflated models that treat the probability of zero and the 
continuous part independently. This is an important feature of our model 
that allows to smooth some unrealistic nonzero values (due to measurement 
error) in the middle of a storm. 
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Radar reflectivity model. For the radar reflectivity data we have a simi- 
lar model, except for the fact that we add multiplicative and additive bias 
components to the reflectivity-rainfall conversion model, 



with n r (i,t) the probability of zero-rain for the radar data at time t and 
grid cell i. The above relationship [when Ri(t) > 0] is the reflectivity-rainfall 
conversion function, 



with the parameter c\ a spatial function (additive bias term), c% a multi- 
plicative bias term and e r a random error component. We tried to model 
the parameter C2 as a spatial function too, but that led to a lack of iden- 
tifiability problem. The reflectivity-rainfall conversion function is modeled 
in the natural logarithm scale, because the error component is thought to 
increase with the precipitation values, and in the original scale too much 
weight is given to high rainfall rates [Lee and Zawadzki (2005), Germann 
and Zawadzki (2002)]. The logits of the unknown ir r (i,t) probabilities are 
modeled as a linear function of the Yi(t), 



where a r and b r are unknown coefficients. Modeling the probability of zero 
rain in terms of the rainfall continuous component becomes very relevant for 
the radar measurements, since in many occasions we obtain isolated zero- 
rain events due to radar measurement error in the middle of a storm (see 
zero-rain radar data in the middle of our geographic domain in Figure 1). 



Latent rainfall process. The latent rainfall process (log scale) Yi(t) is 
modeled as a spatio-temporal Gaussian Markov random field. The pro- 
cess at the first time point Y(l) = (Yi(t), . . . , Y n (t))' has a conditionally 
autoregressive prior (CAR) prior. The CAR prior for Y(l) is a multivari- 
ate normal with a mean X(l)/3, for X(l) = (X\(l), . . . ,X n (l))', that is a 
linear function of temperature, relative humidity and elevation at time 1, 
and with inverse covariance t y Qy, where r Y ~ Gamma(0.5, 0.005) is an un- 
known precision, and Qy = {I — pyDW), where py G (0, 1), I is a diagonal 
matrix with la = 1, W ik = w ik >0 for i ^ k and Wu = w i+ = J2 k ^iW ik , 
and D = Diag(l/wi+) is a diagonal matrix with Da = l/wi+. An attractive 
feature of this prior is that the full conditional prior Yi(l)\Y k (l),k ^ i, is 
normal with mean py{^2 k ^ i ^-(Y k {l) + Xi{l)j3)) and variance l/(r 2 u>i+). 

We assume the weights Wj k are known and equal to Wj k = I(j ~ k), where 
J(j ~ k) indicates whether cells j and k are adjacent. 




0, 

exp{ci(i) + c 2 Yi(t) + e r (i,t)} 



7T r (i,t), 

1 — TT r (i, t) 



log{Ri(t)) = Cl (i) + c 2 log{Ri(t)) + e r (M) 



logit(7r r (i, t)) = a r + b r Yi(t) 



Stage 2. 
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Temporal evolution of the latent rainfall process. Successive values of 
Yi(t) are modeled using a dynamic linear model [Gelfand, Banerjee and 
Gamerman (2005)]. The usual dynamic linear model smoothes Yj(t) toward 
the cell j's previous value, Yj(t — 1), that is, 

(1) Y i {t) = pY i {t-l)+X i {t)(5 + e l {t), 

where p 6 (0, 1) controls the amount of temporal smoothing, €i(t) is a multi- 
variate normal vector with CAR prior, and with inverse covariance T^Q e ^ 
[similar to the CAR model for Y(l), but with mean zero], Xi is a vector 
with the following covariates: elevation and the temporal gradients of tem- 
perature and relative humidity. However, this is inappropriate for rain data 
because the storm moves over time. To correctly model the storm dynamics, 
we introduce a displacement vector Aj(t) that explains the echo motion at 
the grid point i. The model then becomes 

(2) Yi(t) = P Y i+Al{t) (t - 1) + Xi(t)0 + €i(t), 

if the shift A at a given time is not a function of space that would impose 
one constant translation vector, and it would not allow for rotation. To allow 
enough flexibility in the temporal evolution of the rainfall process, we model 
A as a spatial categorical process and we use the wind field information (it 
and v components) (Figure 2) as relevant covariates to explain the shift. We 
write W Ut i(t) and W v ,i(t) to denote the u and v components of the wind 
at time t and location i. S(t) represents a spatial spline basis function at 
time t [number of basis components is chosen using DIC (deviance informa- 
tion criterion), Speigelhalter et al. (2002)] at time t, and a, fa and fa are 
unknown coefficients. The basis function for S(t) are the tensor product of 
two 1-dimensional cubic B-spline basis functions. This type of basis func- 
tions was chosen for computational convenience, while still providing enough 
smoothing for the term S(t). We used information criteria for the selection 
of number of basis and we ran sensitivity analysis. 
We have 

A i (t) = ([S 1;i (t)],[6 2>i (t)]), 
where [x] rounds to the nearest integer to x, and 

S 1>i (t) = aW u , i (t) + S(t)fa 

and 

5 %i {t) = aW V! i{t) + S(t)fa. 

We introduce a spline basis function to characterize the behavior of the 
storm motion not explained by the wind fields. The wind fields (Figure 2) 
can be very noisy at this high spatial-temporal resolution, so additional 
smoothing is needed and that is the role of the spline basis functions. 
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Fig. 2. Wind fields on July 12, 2006 at 4:20 am. The arrow points in the direction the 
wind is moving toward and the length of the stem of the arrow is proportional to wind 
speed. 

We estimate the rainfall intensity, Ri (t) , and the probabilities of zero-rain 
for the gage and radar data, 7r g (s,t) and ir r (i,t), with their predictive pos- 
terior distribution summaries. We fit this framework using a fully Bayesian 
approach. MCMC sampling is carried out using the software R [R Devel- 
opment Core Team (2006)]. Metropolis sampling is used for all parameters. 
The standard deviations of the Gaussian candidates are tuned to give accep- 
tance probabilities near 0.40. Convergence is monitored by inspecting trace 
plots of the deviance and several representative model parameters. 

3.1. Covariance function of the rainfall intensity. It is of interest to cal- 
culate the spatial-temporal covariance of the latent rainfall process, as a 
measure of the dependency structure and variability across time and space 
of the true underlying rainfall process. The process Y(l) has inverse co- 
variance TyQy (defined in the previous section). Conditioning on the shift 
vector, the covariance of Y (log of the rainfall intensity) is 

cov(Y t {t),Y i+h {t + T)) 
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\ k=2 k'=2 / 

+ E p 2t - 2i+2+T ^h)Q^i) ( i + E \ (*); < + h + E ^ (a/)) 

3=2 V fc=j jfc'=j / 

+p t t#)Q$ + E ^ (fc')Y 
\ k'=t+l I 

where p is the smoothing parameter used in equation (2), Q^~ l \i,j) denotes 
the element (row i, column j) of the inverse matrix of Q, and the 

subindexes if, and i\ for A(/c) are functions of the location i and values of 
A at other time points than k. When the shift is constant across space, the 
covariance above becomes a mixture of CAR covariance models. However, 
by allowing the shift to change across space depending on the wind fields 
and geographic features, the covariance becomes space-dependent and shows 
different patterns across space (nonstationarity). 

Thus, by using latent processes, we allow the spatial dependency of the 
log instant rainfall process, YJ, to depend on location, rather than just being 
modeled as a function of distances between grid cells (stationarity assump- 
tion) . From the previous expression for the covariance of the process Yi , we 
see that the temporal evolution of the rainfall process at different locations 
is a function of the shift vector with the wind fields, allowing for the lack of 
full symmetry in the space-time covariance. 



4. Application. The relatively dense spatial coverage of the Korean au- 
tomatic weather stations with 1-minute weather measurements, combined 
with the high spatial resolution of the Doppler radar data from KMA, pro- 
vide a unique opportunity to study the radar reflectivity-rainfall conversion 
function, and to develop frameworks for spatial temporal mesoscale mod- 
eling of rainfall intensity. Lee, Seed and Zawadzki (2007) modeled the spa- 
tial/temporal variability of the conversion errors by using reflectivity and 
rainfall rate measurements at a fixed location. In the work by Lee, Seed 
and Zawadzki (2007) the conversion errors were not modeled across space 
and time due to the limited observations. In the work presented here we 
introduce a statistical model that can capture the complex and heteroge- 
neous spatial temporal structure of mesoscale rainfall intensity combining 
gage and radar reflectivity data with a resolution of 10 minutes (time) and 
1 km xl km (space). Figure 1(a) presents the radar reflectivity (Ze) and 
gage rainfall rate (R in mm/h) on July 12, 2006 at 4:20 am. The empirical 
Pearson correlation between log radar reflectivity and log gage data is 0.5. 
There is an overall agreement between both sources of information in terms 
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Table 1 

We present the DIC value, the posterior expectation of the deviance (D) and the 
estimated effective number of parameters pd, for 5 different models. Model 1: shift and 

bias constant across space. Model 2: shift varying spatially and bias constant across 
space. Model 3: shift constant across space and bias varying spatially. Model 4: shift and 
bias varying spatially (9 spline basis components). Model 5: shift and bias varying 
spatially (25 spline basis components) 



Model 


DIC 


D 


Pd 


1 


18700 


13092 


5607 


2 


18223 


12535 


5688 


3 


19247 


13377 


5870 


4 


17722 


12020 


5702 


5 


18813 


12436 


6377 



of capturing the large scale structure. However, there are discrepancies at 
some given locations. Similar empirical correlations are obtained at other 
time points. 

Figure 1(b) presents both sources of information on July 12, 2006 at 
4:30 am (the next time point). There is a storm moving from the south- 
western to the north-eastern part of our domain, and both sources of data 
seem to capture that phenomenon, showing, however, small scale disagree- 
ments in the intensity of the rainfall. We use the statistical framework intro- 
duced in this paper to combine radar reflectivity and gage data to estimate 
rainfall intensity. 

To determine and justify the need of the more complex model proposed 
in this paper, in which we introduce a spatially varying bias function (ci), 
and a spatially varying shift vector (A), we compare 5 different models that 
assume different spatial structure for the additive bias and the shift vector. 
In Model 1 we present a simple model, where the shift and bias are constant 
functions across space. In Model 2 the shift is varying spatially (using the 
model in Section 3) and the additive bias is constant across space. In Model 3 
the shift is constant across space and the bias is varying spatially. In Model 4 
the shift and bias are varying spatially this is the general model presented in 
Section 3, using 9 basis components for the cubic B-spline function (tensor 
product of two 1-dimensional cubic B-spline basis functions). Model 5 is 
the same as Model 4 but with a different number of components for the 
spline function (25 basis components). In Table 1 we present some model 
comparisons. 

We use the DIC [deviance information criterion, Speigelhalter et al. (2002)] 
to compare model performance. Pd indicates the estimated effective num- 
ber of parameters. Model 3 has larger Pd than Model 4, even if Model 4 
is a more complicated model. The reason for that is that Model 4 has a 
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spatially-varying shift, while Model 3 does not, and adding a small number 
of shift parameters aligns the two time points in a way that allows for more 
temporal smoothing and thus fewer effective random effects. Model 4 has 
the smallest DIC, and we give next a summary of the results for that model. 

The prior used for the multiplicative bias (parameter C2 in the reflectivity- 
rainfall conversion equation) is a Gamma(l,l). The posterior median (95% 
interval) for this multiplicative bias is 1.05 (1.08, 1.13), which seems to in- 
dicate that the recommended 1.6 value for the multiplicative bias (obtained 
based on regression analysis) might not be always appropriate. In Figure 3 
we plot the rainfall gage data versus the reflectivity radar data for our sub- 
domain on July 12,2006 at 4:20 am and 4:30 am, eliminating the no-rain 
events; we also show the standard conversion curve (Ze = 200it! 1 ' 6 ). The 
plot of the spatial posterior median for the spatial bias [parameter exp(ci)] 
is presented in Figure 4. The bias is larger in the south-eastern part of our 
subdomain, where there seems to be more disagreement between radar and 
gage data. This figure also shows the location of the two radar stations. 
There is not much relationship between the location of the radar stations 
and the magnitude of bias for the radar data. It is true that as the radar 
beams propagate through the range of coverage of the Doppler radar, the 
sampling volume and measurement heights increase. In addition, at larger 
distances from the station the radar beams can intercept the melting layer 
of the stratiform rain. Intercepting the melting layer could result in high val- 
ues of radar reflectivity, mainly due to the increase of the dielectric constant 
while maintaining the size of the individual precipitation particles. This phe- 
nomenon could increase the radar bias at locations that are further away 
from the station. However, the current storm is mostly a convective system, 
where the increase of reflectivity in the melting layer is less significant or 
it is not present. Thus, a larger bias is not expected further away from the 
radar stations. 

The estimated values of C2 were very similar for the other 4 models, which 
seems to indicate that this parameter is robust to the structure imposed on 
the additive bias and the shift parameter. The number of basis components 
for the spline surface is 9 for both the additive bias and also for the shift 
vector. Models with a different number of spline basis components had higher 
DIC values (see, e.g., Model 5). 

We present in Figure 5 the vector that shows the storm motions (A,(t)). 
The values presented are the mean of the posterior distribution of the storm 
direction. We can appreciate in this graph the flexibility offered by our sta- 
tistical framework to characterize nonstationary and complex patterns in 
this motion. The estimated displacement vector shows the dominant south- 
west to north-east direction of the storm, but it also captures some small 
scales phenomena, such as the west-east shift in the lower left corner of our 
domain. The average shift is about 11 m/s. The median (95% interval) for 
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Fig. 3. PZot o/ t/ie rainfall data from rain gages, versus radar reflectivity, eliminating 
the zero-rain events, and using data from July 12, 2006 at 4-'20 am and 4:30 am. The 
solid curve corresponds to the standard transformation Ze = 200R . This scatterplot is 
generated in the natural logarithmic scale for both variables (log of rainfall versus log of 
reflectivity), presenting the reflectivity radar and gage data on the original scale on the 



the coefficient a that measures the effect of the smoothed wind fields on 
the shift is 0.39 (0.22, 0.55), which indicates that the smoothed wind plays 
an important role explaining the storm motion (Figure 5). This might be 
expected, since storm systems move along the wind fields at steering levels 
(around 700 to 500 hPa). However, it is also true that the surface winds 
(Figure 2) are highly affected by local effects such as terrain and local heat- 
ing. On the other hand, the smoothed version of the wind fields used in this 
application should attenuate the impact of those local phenomena, and be 
a better predictor of the shift. 

It is also of interest to study the probability of rain for both gage and 
radar data. We used vague normal prior distributions for the logistic pa- 
rameters, N(0, 1/0.01), where 0.01 is the precision. The median (95% in- 
tervals) for the logistic parameters that explain the probability of zero-rain 
for the radar data are —2.81 (—3.08, —2.61) for the intercept parameter, 
and —1.69 (—1.83, —1.58) for the slope parameter. For the gage data, we 
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Fig. 4. Spatial bias surface for the reflectivity radar data at 4-20 am on July 12, 2006 
/exp(ci) parameter]. The two X marks in the plot indicate the locations of the radar 
stations within our subdomain. 

obtain for the intercept —0.23 (—0.68, 0.40), and for the slope parame- 
ter —0.17 (—0.41, 0.06). Both slopes are negative, since higher rain corre- 
sponds to less probability of no rain. There are many less gage observa- 
tions than radar data, so the gage intervals are much wider. In fact, the 
95% interval for the slope covers one. Figures 6(a) and (b) show the prob- 
abilities of no rain for the radar and gage data. As expected, in the cen- 
ter of our domain near the center of the storm the probability of rain is 
very high, but there are other areas with high probability in the north- 
western part of our domain. The probability of no-rain varies much less 
across space for the gage data; this may be caused by a higher rate of 
erroneous zero rain values in the gage data than in the radar data. The 
probability maps illustrate one of the main features of our model, which is 
the ability to correctly estimate the probability of zero-rain, despite some 
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Fig. 5. The mean of the posterior distribution of the shift vector on July 12, 2006 at 
4:30 am. 

incorrect zero-rain measurements in radar and gage data in the middle of 
the storm. 

We present our rainfall maps in Figures 7(a) and (b). These figures present 
the mean of the predictive posterior distribution (ppd) of the rainfall inten- 
sity, Zi(t), on July 12, 2006 at 4:20 am and at 4:30 am, respectively. These 
graphs, overall, present smoother surfaces but with similar rain patterns as 
the radar reflectivity images (Figure 1), except for the areas with larger bias 
(see bias function in Figure 4). However, the scale is different, the estimated 
reflectivity-rainfall function provides the change of scale equation. 

We redid the analysis setting the gage values that were thought to be 
erroneous as missing rather than zero, which is the default value assigned by 
the recording instrumentation at the gages. An empirical standard approach 
was used to identify erroneous gage values based on ad-hoc comparison of the 
gage data with the 9 closest radar pixels. When the number of nonzero rain- 
value pixels was greater or equal than 3, then the zero-rain gage value was 
replaced with a missing label. The obtained additive bias, shift vectors, and 
predictive rainfall values [see Figure 7(c) and (d)] were very similar to the 
ones obtained using the potentially erroneous values. This is due to the fact 
that in our model we characterize the uncertainty in the gage measurements, 
and we combine radar and gage data, so clear erroneous values should be 
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Fig. 6. Graphs (sl) and (b) present the estimated probabilities of no-rain for the rain 
gage and radar data on July 12, 2006 at 4:20 am. 




detected in our model, which is one of the nice features of our proposed 
approach. 

4.1. Calibration. Data from 10% of the gage observations on July 12, 
2006 between 4:20 am and 4:30 am are left out for validation. These ob- 
servations are left out throughout the model fitting. Similarly, 10% of the 
radar reflectivity data on July 12, 2006 between 4:20 am and 4:30 am are 
left out for validation. We repeat this calibration analysis 10 times. Thus, we 
randomly remove 10% of the observations ten separate times and compute 
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Fig. 7. Graphs (a.) and (h) present the mean of the predictive posterior distribution 
of the rainfall intensity (mm/h) on July 12, 2006 at 4:20 am and 4:30 am, respectively. 
Graphs (c) and (d) also present the mean of the predictive posterior distribution of the 
rainfall intensity (mm/h) on July 12, 2006 at 4:20 am and 4'-30 am, but, treating what 
were thought to be erroneous zero-rain gage values as missing rather than zero. 

the coverage probabilities of the 95% prediction intervals for each of the ten 
analyses. 

Figure 8 shows the validation plot for the radar data, for one of the 10 
analyses conducted. The predictive reflectivity radar values are 97.4% of the 
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Fig. 8. Validation plot at randomly selected across space and time hold-out radar data 
locations on July 12, 2006 between 4:20 and 4:30 am. We hold-out 10% of the radar data. 
The radar reflectivity data (log scale ) versus the predicted values ( median of the posterior 
predictive distribution for log rainfall) are plotted as points, the 95% prediction intervals 
are plotted as solid-lines. The predictive reflectivity radar values are 97.4% of the time 
within the 95% prediction bands. The values in the natural logarithm scale are truncated 
at —2. The zero rain values for the observed data are plotted as —2. 

time within the 95% prediction bands. Similar results were obtained for the 
other 10 analyses. Pooling over all ten analyses, the coverage probabilities 
are 96% for the radar data and 94% (196/208) for the gage data. We conclude 
our model is well calibrated. 

5. Discussion. In this work we introduce a framework to combine radar 
reflectivity and gage data to obtain instant rainfall maps. We model the 
gage and reflectivity data using a zero-inflated log-Gaussian process, with 
the probability of zero being a function of the latent rainfall process. Most of 
the previous empirical analyses to study the reflectivity-rainfall conversion 
equation ignore the zero-rain values. In this work we use the zero-rain values, 
furthermore, we determine the probability of zero-rain at any given location 
using a spatial logistic model, taking into account the values of the rainfall 
process in a neighborhood of the location of interest. The underlying rainfall 
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process has a nonstationary and asymmetric covariance that explains the 
spatial temporal complex dependency structures. 

The dataset used for our analysis seem to indicate that the standard val- 
ues of the parameters in the reflectivity-rainfall conversion equation might 
not be appropriate for every storm and geographic domain, and that there 
is need to allow the additive bias to change across space. The main differ- 
ences between the approach presented here and previous analyses are the 
treatment of the zero values, the characterization of the storm motions, and 
the fact that the additive bias is allowed to change across space. Our model 
comparisons seem to indicate that these are all important features that lead 
to a better and more accurate rainfall estimated surfaces. 

In this paper we have made an effort to deal with the various problems of 
the rainfall data sources, but there is always more that could be done. We 
discuss here some forward-looking issues to illustrate some of the challenges 
in making the best use of these data. In our modeling framework we have 
tried to characterize the uncertainty in radar and gage data and in the R— 
Ze conversion equation. However, it would be helpful to conduct different 
sub-analysis to study and understand better the different sources of error 
that contribute to the overall uncertainty, in particular, regarding radar mis- 
calibration and attenuation errors, and errors in the R-Ze conversion due to 
microphysical processes. Radar mis-calibration errors tend to provide a con- 
stant bias within the radar coverage range, while errors due to attenuation 
show high correlation along the path, so it could be possible to separate both 
sources of error. In addition, R-Ze conversion errors should be a function 
of different microphysical processes. Thus, a new formulation that separates 
these various errors merits a new exploration. This would provide a better 
understanding of error sources and a way of mitigating them. 

The understanding of error structures and its use in hydrological and me- 
teorological models have enormous potential. Obvious applications are its 
use in probabilistic verification of the precipitation forecast from numerical 
models, and of the precipitation estimates from various remote sensing in- 
struments such as the space-based radiometers. From our analysis we can 
obtain probability maps of exceeding a certain threshold. This information 
could be used to improve the prediction of the stream flow of extreme hy- 
drological events, and that would aid in the preparation for extreme events. 
We have shown here the mean of the predictive posterior distributions which 
shows a central tendency of each predictive field. Numerous predictive fields 
can be simulated from that distribution and could be used as inputs to gen- 
erate a hydrological and meteorological ensemble forecast. Since hydrological 
and meteorological models are nonlinear, the response to the small variation 
of initial inputs is highly unpredictable. Thus, an ensemble forecast based 
on some initial predictive fields (as obtained in this paper) that characterize 
the rainfall variability is essential to characterize extreme events. 
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